Script to summarise and visualise multiple Mutation Annotation Format (MAF) files using maftools R package.
Make sure to set the java max heap size to 2Gb to accomodate big gene tables written into excel spreadsheet using the xlsx R package
##### Set the jave max heap size to 2Gb to accomodate big gene tables
options( java.parameters = "-Xmx2000m" )
suppressMessages(library(knitr))
suppressMessages(library(maftools))
suppressMessages(library(xlsx))
suppressMessages(library(optparse))
suppressMessages(library(DT))
Go to the MAF files directory, read the files and create directory for output files, if does not exist already
##### Read MAF files and put associated info into a list
mafFiles <- params$mafFiles
mafInfo <- vector("list", length(mafFiles))
cohorts.list <- params$cohorts
names(mafInfo) <- cohorts.list
for ( i in 1:length(mafFiles) ) {
mafInfo[[i]] = read.maf(maf = mafFiles[i], verbose = FALSE)
}
## [1] NA
## [1] NA
## [1] NA
## [1] NA
##### Create directory for output files
outDir <- paste(params$mafDir, params$outDir, sep = "/")
if ( !file.exists(params$outDir) ){
dir.create(outDir)
}
Generate tables with basic information about each MAF file, including NCBI build, no. fo samples and genes, no. of different mutation types ( frameshift deletions, frameshift insertions, in-frame deletions, in-frame insertions, missense mutations, nonsense mutations, nonstop mutations, splice site mutations, translation start site mutations), as well as the total no. of mutations present in the MAF file. Individual tables present summary for corresponding datasets.
##### Write overall summary into a file
if ( !file.exists(paste(outDir, "MAF_summary.xlsx", sep = "/")) ){
for ( i in 1:length(mafFiles) ) {
write.xlsx(mafInfo[[i]]@summary, file=paste(outDir, "MAF_summary.xlsx", sep="/"), sheetName=cohorts.list[i], row.names=FALSE, append=TRUE)
}
}
##### Present a MAF file summary table in the html report
##### Create a list for htmlwidgets
widges.list <- htmltools::tagList()
for ( i in 1:length(mafFiles) ) {
widges.list[[i]] <- DT::datatable(data = mafInfo[[i]]@summary, caption = paste0("MAF summary: ", cohorts.list[i]))
}
##### Print a list of htmlwidgets
widges.list
Additionally ceate an excel spreadsheet listing all fields (columns) in the individaul MAF files.
##### Get all fields in MAF files
if ( !file.exists(paste(outDir, "MAF_fields.xlsx", sep = "/")) ){
for ( i in 1:length(mafFiles) ) {
write.xlsx(maftools::getFields(mafInfo[[i]]), file=paste(outDir, "MAF_fields.xlsx", sep="/"), sheetName=cohorts.list[i], row.names=FALSE, append=TRUE, col.names=FALSE)
}
}
Create tables with samples summary. Each table contains per-sample information (rows) about no. of different types of mutations (columns), including frameshift deletions, frameshift insertions, in-frame deletions, in-frame insertions, missense mutations, nonsense mutations, nonstop mutations, splice site mutations, translation start site mutations, as well as the total no. of mutations present in the MAF file.
##### Write samples summary into a file
if ( !file.exists(paste(outDir, "MAF_sample_summary.xlsx", sep = "/")) ) {
for ( i in 1:length(mafFiles) ) {
write.xlsx(maftools::getSampleSummary(mafInfo[[i]]), file=paste(outDir, "MAF_sample_summary.xlsx", sep="/"), sheetName=cohorts.list[i], row.names=FALSE, append=TRUE)
}
}
##### Present a sample table in the html report
##### Create a list for htmlwidgets
widges.list <- htmltools::tagList()
for ( i in 1:length(mafFiles) ) {
widges.list[[i]] <- DT::datatable(data = maftools::getSampleSummary(mafInfo[[i]]), caption = paste0("Samples summary: ", cohorts.list[i]), filter = "top")
}
##### Print a list of htmlwidgets
widges.list
Now present gene summary info. This part creates tables for individual cohorts with per-gene information (rows) about no. of different types of mutations (columns), including frameshift deletions, frameshift insertions, in-frame deletions, in-frame insertions, missense mutations, nonsense mutations, nonstop mutations, splice site mutations, translation start site mutations, as well as the total no. of mutations present in the MAF file. The last two columns contain the no. of samples with mutations/alterations in the corresponding gene.
##### Write gene summary into a file
if ( !file.exists(paste(outDir, "MAF_gene_summary.xlsx", sep = "/")) ){
for ( i in 1:length(mafFiles) ) {
write.xlsx(maftools::getGeneSummary(mafInfo[[i]]), file=paste(outDir, "MAF_gene_summary.xlsx", sep="/"), sheetName=cohorts.list[i], row.names=FALSE, append=TRUE)
}
}
##### Present a gene table in the html report
##### Create a list for htmlwidgets
widges.list <- htmltools::tagList()
for ( i in 1:length(mafFiles) ) {
widges.list[[i]] <- DT::datatable(data = maftools::getGeneSummary(mafInfo[[i]]), caption = paste0("Genes summary: ", cohorts.list[i]), filter = "top")
}
##### Print a list of htmlwidgets
widges.list
Generate an interactive heatmap to facilitate outlier samples detection. Rows and columns represent samples and mutation types, respectively. The colour scale from blue to yellow indicates low and high number of various mutations types, respectively, reported for corresponding samples. Samples are ordered by the number of mutations to facilitate identification of individuals with extreme mutation burden.
suppressMessages(library(plotly))
suppressMessages(library(heatmaply))
##### Create a list for htmlwidgets
widges.list <- htmltools::tagList()
##### Display samples summary in a form of interactive heatmap
for ( i in 1:length(mafFiles) ) {
sampleSummary <- data.frame(maftools::getSampleSummary(mafInfo[[i]]))
rownames(sampleSummary) <-sampleSummary[,"Tumor_Sample_Barcode"]
sampleSummary <- subset(sampleSummary, select=-c(Tumor_Sample_Barcode, total))
##### Generate interactive heatmap
p <- heatmaply(sampleSummary, main = paste0("Samples summary: ", cohorts.list[i]), Rowv=NULL, Colv=NULL, scale="none", dendrogram="none", trace="none", hide_colorbar = FALSE, fontsize_row = 8, label_names=c("Sample","Mutation_type","Count")) %>%
layout(width = 900, height = 600, margin = list(l=150, r=10, b=150, t=50, pad=4), titlefont = list(size=16), xaxis = list(tickfont=list(size=10)), yaxis = list(tickfont=list(size=10)))
##### Add plot to the list for htmlwidgets
widges.list[[i]] <- as_widget(ggplotly(p))
##### Save the heatmap as html (PLOTLY)
htmlwidgets::saveWidget(as_widget(p), paste0(outDir, "/MAF_sample_summary_heatmap_", cohorts.list[i], ".html"), selfcontained = TRUE)
##### Plotly option
#p <- plot_ly(x = colnames(sampleSummary), y = rownames(sampleSummary), z = as.matrix(sampleSummary), height = 600, type = "heatmap") %>%
#layout(title = paste0("Samples summary: ", cohorts.list[i]), autosize = TRUE, margin = list(l=150, r=10, b=100, t=100, pad=4), showlegend = TRUE)
#widges.list[[i]] <- ggplotly(p)
}
##### Detach plotly package. Otherwise it clashes with other graphics devices
detach("package:heatmaply", unload=FALSE)
detach("package:plotly", unload=FALSE)
##### Print a list of htmlwidgets
widges.list
This time generate an interactive heatmap to summmarise genes information. Rows and columns represent genes and mutation types, respectively. The colour scale from blue to yellow indicates low and high number of various mutations types, respectively, observed in corresponding genes. Genes are ordered by the number of reported mutations. The total number of mutations in individual genes, as well as the number of samples with mutations are also presented in the last three columns. Note, for transparency we show only the top 50 mutated genes.
suppressMessages(library(plotly))
suppressMessages(library(heatmaply))
##### Create a list for htmlwidgets
widges.list <- htmltools::tagList()
##### Display genes summary in a form of interactive heatmap
for ( i in 1:length(mafFiles) ) {
geneSummary <- data.frame(maftools::getGeneSummary(mafInfo[[i]])[1:50,])
rownames(geneSummary) <-geneSummary[,"Hugo_Symbol"]
geneSummary <- subset(geneSummary, select=-c(Hugo_Symbol))
##### Generate interactive heatmap
p <- heatmaply(geneSummary, main = paste0("Genes summary: ", cohorts.list[i]), Rowv=NULL, Colv=NULL, scale="none", dendrogram="none", trace="none", hide_colorbar = FALSE, fontsize_row = 8, label_names=c("Gene","Mutation_type","Count")) %>%
layout(width = 900, height = 600, margin = list(l=150, r=10, b=150, t=50, pad=4), titlefont = list(size=16), xaxis = list(tickfont=list(size=10)), yaxis = list(tickfont=list(size=10)))
##### Add plot to the list for htmlwidgets
widges.list[[i]] <- as_widget(ggplotly(p))
##### Save the heatmap as html (PLOTLY)
htmlwidgets::saveWidget(as_widget(p), paste0(outDir, "/MAF_gene_summary_heatmap_", cohorts.list[i], ".html"), selfcontained = TRUE)
##### Plotly option
#p <- plot_ly(x = colnames(geneSummary), y = rownames(geneSummary), z = as.matrix(geneSummary), height = 600, type = "heatmap") %>%
#layout(title = paste0("Genes summary: ", cohorts.list[i]), autosize = TRUE, margin = list(l=150, r=10, b=100, t=100, pad=4), showlegend = TRUE)
#widges.list[[i]] <- ggplotly(p)
}
##### Detach plotly package. Otherwise it clashes with other graphics devices
detach("package:heatmaply", unload=FALSE)
detach("package:plotly", unload=FALSE)
##### Print a list of htmlwidgets
widges.list
This part creates a set of plots summarising individual MAF files.
A summary for MAF file displaying frequency of various mutation/SNV types/classes (top panel), the number of variants in each sample as a stacked bar-plot (bottom-left) and variant types as a box-plot (bottom-middle), as well as the frequency of different mutation types for the top 10 mutated genes (bottom-right). The horizontal dashed line in stacked bar-plot represents median number of variants across the cohort.
###### Generate separate plot for each cohort
for ( i in 1:length(mafFiles) ) {
cat(paste(cohorts.list[i], "cohort\n\n", sep=" "))
##### Plotting MAF summary
par(mar=c(4,4,2,0.5), oma=c(1.5,2,2,1))
maftools::plotmafSummary(maf = mafInfo[[i]], rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)
mtext("MAF summary", outer=TRUE, cex=1, line=-0.5)
}
## TCGA-PAAD cohort
## TCGA-PAAD-MC3 cohort
## ICGC-PACA-AU cohort
## ICGC-PACA-AU-additional cohort
## ICGC-PACA-CA cohort
Oncoplot illustrating different types of mutations observed across samples for the 10 most frequently mutated genes. The side and top bar-plots present the frequency of mutations in each gene and in each sample, respectively.
###### Generate separate plot for each cohort
for ( i in 1:length(mafFiles) ) {
cat(paste(cohorts.list[i], "cohort\n\n", sep=" "))
##### Drawing oncoplots for the top 10 genes in each cohort
plot.new()
par(mar=c(4,4,2,0.5), oma=c(1.5,2,2,1))
maftools::oncoplot(maf = mafInfo[[i]], top = 10, fontSize = 12)
}
## TCGA-PAAD cohort
## TCGA-PAAD-MC3 cohort
## ICGC-PACA-AU cohort
## ICGC-PACA-AU-additional cohort
## ICGC-PACA-CA cohort
Plots presenting the transition and transversions distribution. The box-plots (top panel) show the overall distribution of the six different conversions (C>A, C>G, C>T, T>C, T>A and T>G)(left), and the transition and transversions frequency (right). The stacked bar-plot (bottom) displays the fraction of the six different conversions in each sample.
###### Generate separate plot for each cohort
for ( i in 1:length(mafFiles) ) {
cat(paste(cohorts.list[i], "cohort\n\n", sep=" "))
##### Drawing distribution plots of the transitions and transversions
titv.info <- maftools::titv(maf = mafInfo[[i]], plot = FALSE, useSyn = TRUE)
maftools::plotTiTv(res = titv.info)
mtext("Transition and transversions distribution", outer=TRUE, cex=1, line=-1.5)
}
## TCGA-PAAD cohort
## TCGA-PAAD-MC3 cohort
## ICGC-PACA-AU cohort
## ICGC-PACA-AU-additional cohort
## ICGC-PACA-CA cohort
##### Write per-sample transitions and transversions distribution into a file
if ( !file.exists(paste(outDir, "MAF_summary_titv.xlsx", sep = "/")) ) {
for ( i in 1:length(mafFiles) ) {
write.xlsx(titv.info$fraction.contribution, file=paste(outDir, "MAF_summary_titv.xlsx", sep="/"), sheetName=paste0(cohorts.list[i], " (fraction)"), row.names=FALSE, append=TRUE)
write.xlsx(titv.info$raw.counts, file=paste(outDir,"MAF_summary_titv.xlsx", sep="/"), sheetName=paste0(cohorts.list[i], " (count)"), row.names=FALSE, append=TRUE)
write.xlsx(titv.info$TiTv.fractions, file=paste(outDir,"MAF_summary_titv.xlsx", sep="/"), sheetName=paste0(cohorts.list[i], " (TiTv fractions)"), row.names=FALSE, append=TRUE)
}
} else {
cat(paste("\nFile \"MAF_summary_titv.xlsx\" already exists in", outDir, "!\n\n", sep=" "))
}
Perform Fisher Exact test on all genes for all possible pair-wise comprisons to find differentially mutated genes between queried cohorts and to aid identification of global differences in mutation patterns in corresponding MAFs. The results are presented in form of a heatmap with rows and columns representing genes and cohorts, respectively, and heatmap colours indicating the adjusted Fisher Exact test p-values. The colour key is on the left-hand side. Low p-values (yellow) indicate differentially mutated genes between the corresponding cohorts. Genes are clustered to facilite the indetification possible differences between individaul cohorts.
suppressMessages(library(plotly))
suppressMessages(library(heatmaply))
##### Create matrix of possible comparisons
comb <- combn(levels(factor(cohorts.list)), 2)
##### Get number of possible comparisons using the following formula:
#
# n!/((n-r)!(r!))
#
# n = the number of classes to compare
# r = the number of elements for single comparison
#
################################################################################
combNo <- factorial(length(cohorts.list))/(factorial(length(cohorts.list)-2)*(factorial(2))) # n!/((n-r)!(r!))
combNames <- NULL
mafCompare.res <- vector("list", combNo)
##### Perform pair-wise cohorts MAFs comparisons
for (i in 1:combNo) {
combNames[i] <- paste(comb[1,i], comb[2,i], sep=" vs ")
mafCompare.res[[i]] <- maftools::mafCompare(m1 = mafInfo[[comb[1,i]]], m2 = mafInfo[[comb[2,i]]], m1Name = comb[1,i], m2Name = comb[2,i], minMut = 0)$results
##### Sort the data-frame by gene symbol
mafCompare.res[[i]] <- mafCompare.res[[i]][order(mafCompare.res[[i]]$Hugo_Symbol)]
rownames(mafCompare.res[[i]]) <- mafCompare.res[[i]]$Hugo_Symbol
}
##### Extract the intersection of genes for the heatmap
common_genes <- Reduce(intersect, lapply(mafCompare.res, row.names))
mafCompare.res.common <- lapply(mafCompare.res, function(x) { x[row.names(x) %in% common_genes,] })
##### Extract adjusted p-values for each comparison, use genes names fow row names
mafCompare.res.common.p <- do.call(cbind, lapply(mafCompare.res.common, function(x) x[, c("Hugo_Symbol", "adjPval")]))
common_genes <- mafCompare.res.common.p$Hugo_Symbol
mafCompare.res.common.p <- mafCompare.res.common.p[,-c("Hugo_Symbol")]
mafCompare.res.common.p <- as.matrix(mafCompare.res.common.p)
colnames(mafCompare.res.common.p) <- combNames
rownames(mafCompare.res.common.p) <- common_genes
##### Cluster genes
hr <- hclust(as.dist(dist(mafCompare.res.common.p, method="euclidean")), method="ward.D")
##### Generate interactive heatmap
p <- heatmaply(mafCompare.res.common.p, Rowv=as.dendrogram(hr), Colv=NULL, viridis(n=256, alpha = 1, begin = 1, end = 0, option = "viridis"), scale="none", dendrogram="row", trace="none", hide_colorbar = FALSE, fontsize_row = 8, label_names=c("Gene","Comparison","P-value")) %>%
layout(width = 900, height = 900, margin = list(l=150, r=10, b=250, t=50, pad=4), titlefont = list(size=16), xaxis = list(tickfont=list(size=10)), yaxis = list(tickfont=list(size=10)))
as_widget(ggplotly(p))
##### Save the heatmap as html (PLOTLY)
htmlwidgets::saveWidget(as_widget(p), paste0(outDir, "/MAF_pair-wise_comparisons_heatmap.html"), selfcontained = TRUE)
##### Detach plotly package. Otherwise it clashes with other graphics devices
detach("package:heatmaply", unload=FALSE)
detach("package:plotly", unload=FALSE)
Plot illustrating the mutation load in ICGC PACA-CA cohort along distribution of variants compiled from over 10,000 WXS samples across 33 TCGA landmark cohorts. Every dot represents a sample whereas the red horizontal lines are the median numbers of mutations in the respective cancer types. The vertical axis (log scaled) shows the number of mutations per megabase whereas the different cancer types are ordered on the horizontal axis based on their median numbers of somatic mutations. This plot is similar to the one described in the paper “Signatures of mutational processes in human cancer” by Alexandrov et al. (PMID: 23945592)
###### Generate separate plot for each cohort
for ( i in 1:length(mafFiles) ) {
##### Compare mutation load against TCGA cohorts
maftools::tcgaCompare(maf = mafInfo[[i]], cohortName = cohorts.list[i], primarySite=TRUE)
}
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: TCGA-PAAD 142 47.0
## 17: Kidney Clear Cell 339 44.0
## 18: Uterus 57 35.0
## 19: Breast 1044 34.0
## 20: Brain Glioblastoma 395 32.0
## 21: Soft Tissue 255 31.0
## 22: Bile Duct 51 30.0
## 23: Pleura 83 25.0
## 24: Pancreas 178 22.5
## 25: Adrenal Gland Carcinoma 92 21.5
## 26: Brain Glioma 511 19.0
## 27: Prostate 496 19.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: Kidney Clear Cell 339 44.0
## 17: Uterus 57 35.0
## 18: Breast 1044 34.0
## 19: TCGA-PAAD-MC3 154 34.0
## 20: Brain Glioblastoma 395 32.0
## 21: Soft Tissue 255 31.0
## 22: Bile Duct 51 30.0
## 23: Pleura 83 25.0
## 24: Pancreas 178 22.5
## 25: Adrenal Gland Carcinoma 92 21.5
## 26: Brain Glioma 511 19.0
## 27: Prostate 496 19.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: Kidney Clear Cell 339 44.0
## 17: Uterus 57 35.0
## 18: Breast 1044 34.0
## 19: Brain Glioblastoma 395 32.0
## 20: Soft Tissue 255 31.0
## 21: Bile Duct 51 30.0
## 22: Pleura 83 25.0
## 23: Pancreas 178 22.5
## 24: Adrenal Gland Carcinoma 92 21.5
## 25: Brain Glioma 511 19.0
## 26: Prostate 496 19.0
## 27: ICGC-PACA-AU 395 18.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: Kidney Clear Cell 339 44.0
## 17: ICGC-PACA-AU-additional 25 40.0
## 18: Uterus 57 35.0
## 19: Breast 1044 34.0
## 20: Brain Glioblastoma 395 32.0
## 21: Soft Tissue 255 31.0
## 22: Bile Duct 51 30.0
## 23: Pleura 83 25.0
## 24: Pancreas 178 22.5
## 25: Adrenal Gland Carcinoma 92 21.5
## 26: Brain Glioma 511 19.0
## 27: Prostate 496 19.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: Kidney Clear Cell 339 44.0
## 17: Uterus 57 35.0
## 18: Breast 1044 34.0
## 19: Brain Glioblastoma 395 32.0
## 20: Soft Tissue 255 31.0
## 21: Bile Duct 51 30.0
## 22: Pleura 83 25.0
## 23: ICGC-PACA-CA 336 25.0
## 24: Pancreas 178 22.5
## 25: Adrenal Gland Carcinoma 92 21.5
## 26: Brain Glioma 511 19.0
## 27: Prostate 496 19.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
###### Generate separate file with plots for each cohort
for ( i in 1:length(mafFiles) ) {
pdf( file = paste(outDir, "/MAF_summary_", cohorts.list[i], ".pdf", sep = "") )
##### Plotting MAF summary
par(mar=c(4,4,2,0.5), oma=c(1.5,2,2,1))
plotmafSummary(maf = mafInfo[[i]], rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)
mtext("MAF summary", outer=TRUE, cex=1, line=-0.5)
##### Drawing oncoplots for the top 10 genes in each cohort
plot.new()
par(mar=c(4,4,2,0.5), oma=c(1.5,2,2,1))
oncoplot(maf = mafInfo[[i]], top = 10, fontSize = 12)
##### Drawing distribution plots of the transitions and transversions
titv.info = titv(maf = mafInfo[[i]], plot = FALSE, useSyn = TRUE)
plotTiTv(res = titv.info)
mtext("Transition and transversions distribution", outer=TRUE, cex=1, line=-1.5)
##### Compare mutation load against TCGA cohorts
tcgaCompare(maf = mafInfo[[i]], cohortName = cohorts.list[i], primarySite=TRUE)
mtext("Mutation load in TCGA cohorts", outer=TRUE, cex=1, line=-0.5)
dev.off()
}
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: TCGA-PAAD 142 47.0
## 17: Kidney Clear Cell 339 44.0
## 18: Uterus 57 35.0
## 19: Breast 1044 34.0
## 20: Brain Glioblastoma 395 32.0
## 21: Soft Tissue 255 31.0
## 22: Bile Duct 51 30.0
## 23: Pleura 83 25.0
## 24: Pancreas 178 22.5
## 25: Adrenal Gland Carcinoma 92 21.5
## 26: Brain Glioma 511 19.0
## 27: Prostate 496 19.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: Kidney Clear Cell 339 44.0
## 17: Uterus 57 35.0
## 18: Breast 1044 34.0
## 19: TCGA-PAAD-MC3 154 34.0
## 20: Brain Glioblastoma 395 32.0
## 21: Soft Tissue 255 31.0
## 22: Bile Duct 51 30.0
## 23: Pleura 83 25.0
## 24: Pancreas 178 22.5
## 25: Adrenal Gland Carcinoma 92 21.5
## 26: Brain Glioma 511 19.0
## 27: Prostate 496 19.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: Kidney Clear Cell 339 44.0
## 17: Uterus 57 35.0
## 18: Breast 1044 34.0
## 19: Brain Glioblastoma 395 32.0
## 20: Soft Tissue 255 31.0
## 21: Bile Duct 51 30.0
## 22: Pleura 83 25.0
## 23: Pancreas 178 22.5
## 24: Adrenal Gland Carcinoma 92 21.5
## 25: Brain Glioma 511 19.0
## 26: Prostate 496 19.0
## 27: ICGC-PACA-AU 395 18.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: Kidney Clear Cell 339 44.0
## 17: ICGC-PACA-AU-additional 25 40.0
## 18: Uterus 57 35.0
## 19: Breast 1044 34.0
## 20: Brain Glioblastoma 395 32.0
## 21: Soft Tissue 255 31.0
## 22: Bile Duct 51 30.0
## 23: Pleura 83 25.0
## 24: Pancreas 178 22.5
## 25: Adrenal Gland Carcinoma 92 21.5
## 26: Brain Glioma 511 19.0
## 27: Prostate 496 19.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
## Cohort Cohort_Size Median_Mutations
## 1: Skin 468 315.0
## 2: Lung Squamous 494 187.5
## 3: Lung Adeno 567 158.0
## 4: Bladder 412 131.5
## 5: Esophagus 184 85.0
## 6: Head and Neck 509 82.0
## 7: Stomach 439 82.0
## 8: Lymph Nodes 48 81.5
## 9: Uterus Corpus 542 78.0
## 10: Colorectal Colon 433 76.0
## 11: Ovary 443 72.0
## 12: Liver 374 67.0
## 13: Cervix 305 66.0
## 14: Colorectal Rectum 158 63.0
## 15: Kidney Papilary 288 53.0
## 16: Kidney Clear Cell 339 44.0
## 17: Uterus 57 35.0
## 18: Breast 1044 34.0
## 19: Brain Glioblastoma 395 32.0
## 20: Soft Tissue 255 31.0
## 21: Bile Duct 51 30.0
## 22: Pleura 83 25.0
## 23: ICGC-PACA-CA 336 25.0
## 24: Pancreas 178 22.5
## 25: Adrenal Gland Carcinoma 92 21.5
## 26: Brain Glioma 511 19.0
## 27: Prostate 496 19.0
## 28: Kidney Chromophobe 66 13.0
## 29: Testis 149 11.0
## 30: Thymus 123 11.0
## 31: Bone Marrow 143 10.0
## 32: Eye 80 10.0
## 33: Thyroid 491 9.0
## 34: Adrenal Gland 177 7.0
## Cohort Cohort_Size Median_Mutations
for ( i in 1:length(params) ) {
cat(paste("Parameter: ", names(params)[i], "\nValue: ", paste(unlist(params[i]), collapse = ","), "\n\n", sep=""))
}
## Parameter: mafDir
## Value: /Users/jmarzec/data/mutation
##
## Parameter: mafFiles
## Value: /Users/jmarzec/data/mutation/PAAD.tcga.uuid.curated.somatic.clean.maf,/Users/jmarzec/data/mutation/PAAD.tcga.mc3.somatic.maf,/Users/jmarzec/data/mutation/PACA-AU.icgc.simple_somatic_mutation.maf,/Users/jmarzec/data/mutation/DCC17_PDAC_Not_in_DCC.maf,/Users/jmarzec/data/mutation/PACA-CA.icgc.simple_somatic_mutation.maf
##
## Parameter: cohorts
## Value: TCGA-PAAD,TCGA-PAAD-MC3,ICGC-PACA-AU,ICGC-PACA-AU-additional,ICGC-PACA-CA
##
## Parameter: outDir
## Value: MAF_summary
sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] viridis_0.5.1 viridisLite_0.3.0 ggplot2_2.2.1.9000
## [4] DT_0.4 knitr_1.20 optparse_1.4.4
## [7] xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-9
## [10] maftools_1.6.07 Biobase_2.40.0 BiocGenerics_0.26.0
##
## loaded via a namespace (and not attached):
## [1] changepoint_2.2.2 backports_1.1.2
## [3] circlize_0.4.3 NMF_0.21.0
## [5] plyr_1.8.4 lazyeval_0.2.1
## [7] splines_3.5.0 BiocParallel_1.14.1
## [9] crosstalk_1.0.0 GenomeInfoDb_1.16.0
## [11] gridBase_0.4-7 digest_0.6.15
## [13] foreach_1.4.4 htmltools_0.3.6
## [15] gdata_2.18.0 magrittr_1.5
## [17] memoise_1.1.0 BSgenome_1.48.0
## [19] cluster_2.0.7-1 doParallel_1.0.11
## [21] gclus_1.3.1 ComplexHeatmap_1.18.0
## [23] Biostrings_2.48.0 wordcloud_2.5
## [25] matrixStats_0.53.1 prettyunits_1.0.2
## [27] colorspace_1.3-2 blob_1.1.1
## [29] ggrepel_0.8.0 dplyr_0.7.5
## [31] RCurl_1.95-4.10 jsonlite_1.5
## [33] bindr_0.1.1 survival_2.42-3
## [35] VariantAnnotation_1.26.0 zoo_1.8-1
## [37] iterators_1.0.9 glue_1.2.0
## [39] registry_0.5 gtable_0.2.0
## [41] zlibbioc_1.26.0 XVector_0.20.0
## [43] webshot_0.5.0 GetoptLong_0.1.6
## [45] DelayedArray_0.6.0 kernlab_0.9-26
## [47] shape_1.4.4 DEoptimR_1.0-8
## [49] prabclus_2.2-6 scales_0.5.0.9000
## [51] mvtnorm_1.0-7 DBI_1.0.0
## [53] rngtools_1.3.1 Rcpp_0.12.17
## [55] xtable_1.8-2 progress_1.1.2
## [57] bit_1.1-13 mclust_5.4
## [59] stats4_3.5.0 htmlwidgets_1.2
## [61] httr_1.3.1 getopt_1.20.2
## [63] gplots_3.0.1 RColorBrewer_1.1-2
## [65] fpc_2.1-11 modeltools_0.2-21
## [67] pkgconfig_2.0.1 XML_3.98-1.11
## [69] flexmix_2.3-14 nnet_7.3-12
## [71] labeling_0.3 tidyselect_0.2.4
## [73] rlang_0.2.0.9001 reshape2_1.4.3
## [75] later_0.7.2 AnnotationDbi_1.42.1
## [77] munsell_0.4.3 tools_3.5.0
## [79] RSQLite_2.1.1 evaluate_0.10.1
## [81] stringr_1.3.1 heatmaply_0.14.1
## [83] yaml_2.1.19 bit64_0.9-7
## [85] robustbase_0.93-0 caTools_1.17.1
## [87] purrr_0.2.4 dendextend_1.8.0
## [89] bindrcpp_0.2.2 whisker_0.3-2
## [91] mime_0.5 slam_0.1-43
## [93] biomaRt_2.36.0 compiler_3.5.0
## [95] plotly_4.7.1.9000 tibble_1.4.2
## [97] stringi_1.2.2 GenomicFeatures_1.32.0
## [99] trimcluster_0.1-2 lattice_0.20-35
## [101] Matrix_1.2-14 pillar_1.2.2
## [103] GlobalOptions_0.0.13 data.table_1.11.2
## [105] cowplot_0.9.2 bitops_1.0-6
## [107] seriation_1.2-3 httpuv_1.4.3
## [109] rtracklayer_1.40.2 GenomicRanges_1.32.3
## [111] R6_2.2.2 TSP_1.1-6
## [113] promises_1.0.1 KernSmooth_2.23-15
## [115] gridExtra_2.3 IRanges_2.14.10
## [117] codetools_0.2-15 gtools_3.5.0
## [119] MASS_7.3-50 assertthat_0.2.0
## [121] SummarizedExperiment_1.10.1 pkgmaker_0.22
## [123] rprojroot_1.3-2 rjson_0.2.19
## [125] withr_2.1.2 GenomicAlignments_1.16.0
## [127] Rsamtools_1.32.0 S4Vectors_0.18.2
## [129] GenomeInfoDbData_1.1.0 diptest_0.75-7
## [131] grid_3.5.0 tidyr_0.8.1
## [133] class_7.3-14 rmarkdown_1.9
## [135] shiny_1.1.0